A rule-based approach to farsi language text-to-phoneme conversion

نویسندگان

  • Mohammad Reza Sadigh
  • Hamid Sheikhzadeh
  • M. R. Jahangir
  • Arash Farzan
چکیده

A conversion from orthographic (written) form to a phonetic transcription is the first stage in a text-to-speech system. In this study, algorithms are presented to facilitate the text-to-phoneme (TTP) conversion for the Farsi language. Using a lexicon of about 15000 base morphemes, word formation rules are investigated and implemented. Moreover, a word segmentation of the written sentence has to be done prior to any phonetic transcription of the text. Due to special form of Farsi orthography, the word segmentation process is a complicated one. To solve the problem, a fast and on-line algorithm and a more complicated off-line algorithm are presented. The overall performance of the TTP conversion is evaluated to be more than

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Grapheme to phoneme conversion: an Arabic dialect case

We aim to develop a Speech-to-Speech translation system between Modern Standard Arabic and Algiers dialect. Such a system must include a Text-to-Speech module which itself must include a Grapheme-to-Phoneme converter. Algiers dialect is an Arabic dialect concerned by the most problems of Modern Standard Arabic in NLP area. Furthermore, it could be considered as an under-resourced language becau...

متن کامل

Rule-based Korean Grapheme to Phoneme Conversion Using Sound Patterns

Grapheme-to-phoneme conversion plays an important role in text-to-speech applications and other fields of computational linguistics. Although Korean uses a phonemic writing system, it must have a grapheme-to-phoneme conversion for speech synthesis because Korean writing system does not always reflect its actual pronunciations. This paper describes a grapheme-to-phoneme conversion method based o...

متن کامل

Grapheme-to-Phoneme conversion, a knowledge-based approach

This paper reflects the results of an ongoing project at Högskolan i Skövde, aimed at the creation of a system for grapheme-to-phoneme conversion for Swedish, from a knowledge-based approach. The focus lies on development and implementation of an algorithm for parsing ortographic text, and phonetic rules for the transcription.

متن کامل

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

A Semantic Approach to Person Profile Extraction from Farsi Documents

Entity profiling (EP) as an important task of Web mining and information extraction (IE) is the process of extracting entities in question and their related information from given text resources. From computational viewpoint, the Farsi language is one of the less-studied and less-resourced languages, and suffers from the lack of high quality language processing tools. This problem emphasizes th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000